Chapter 19: Tree Problems
Height and Depth Calculation
Height and Depth Calculation
Tree height and depth are fundamental measurements that appear in countless algorithmsβfrom balancing operations to complexity analysis. Yet these seemingly simple calculations reveal deep insights about recursive thinking.
Let's establish our reference implementation by solving a concrete problem: calculating the height of a binary tree representing a company's organizational hierarchy.
The Problem: Measuring Organizational Depth
You're building an HR analytics tool. Given a company's org chart as a binary tree, you need to determine the maximum management levels from CEO to individual contributors. This measurement affects everything from communication overhead to decision-making speed.
Here's our tree structure:
class TreeNode:
"""A node in a binary tree."""
def __init__(self, value, left=None, right=None):
self.value = value
self.left = left
self.right = right
def __repr__(self):
return f"TreeNode({self.value})"
# Build a sample org chart
# CEO
# / \
# VP_Eng VP_Sales
# / \ \
# Mgr_A Mgr_B Mgr_C
# /
# IC_1
ceo = TreeNode("CEO")
vp_eng = TreeNode("VP_Eng")
vp_sales = TreeNode("VP_Sales")
mgr_a = TreeNode("Mgr_A")
mgr_b = TreeNode("Mgr_B")
mgr_c = TreeNode("Mgr_C")
ic_1 = TreeNode("IC_1")
ceo.left = vp_eng
ceo.right = vp_sales
vp_eng.left = mgr_a
vp_eng.right = mgr_b
vp_sales.right = mgr_c
mgr_a.left = ic_1
Iteration 0: The Naive Approach
Let's start with the most intuitive approachβtrying to measure height by counting nodes:
def tree_height_v0(node):
"""
Attempt 1: Count all nodes (WRONG APPROACH).
This will be our anchor example for refinement.
"""
if node is None:
return 0
# Count this node plus all descendants
left_count = tree_height_v0(node.left)
right_count = tree_height_v0(node.right)
return 1 + left_count + right_count
# Test it
print("Height (v0):", tree_height_v0(ceo))
print("Expected: 4 (CEO -> VP -> Mgr -> IC)")
Python Output:
Height (v0): 7
Expected: 4 (CEO -> VP -> Mgr -> IC)
Diagnostic Analysis: Understanding the Failure
The complete execution trace:
Let's trace what actually happened:
tree_height_v0(CEO)
β left_count = tree_height_v0(VP_Eng)
β left_count = tree_height_v0(Mgr_A)
β left_count = tree_height_v0(IC_1)
β left_count = tree_height_v0(None) = 0
β right_count = tree_height_v0(None) = 0
β return 1 + 0 + 0 = 1
β right_count = tree_height_v0(None) = 0
β return 1 + 1 + 0 = 2
β right_count = tree_height_v0(Mgr_B)
β left_count = tree_height_v0(None) = 0
β right_count = tree_height_v0(None) = 0
β return 1 + 0 + 0 = 1
β return 1 + 2 + 1 = 4
β right_count = tree_height_v0(VP_Sales)
β left_count = tree_height_v0(None) = 0
β right_count = tree_height_v0(Mgr_C)
β left_count = tree_height_v0(None) = 0
β right_count = tree_height_v0(None) = 0
β return 1 + 0 + 0 = 1
β return 1 + 0 + 1 = 2
β return 1 + 4 + 2 = 7
Result: 7
Let's parse this section by section:
- The result: We got 7, but expected 4
-
What this tells us: We're counting something, but not height
-
The recursive pattern:
- We're adding:
1 + left_count + right_count - This sums all nodes in the tree
-
We have 7 nodes total: CEO, VP_Eng, VP_Sales, Mgr_A, Mgr_B, Mgr_C, IC_1
-
The conceptual error:
- Height is the longest PATH from root to leaf
- We're counting NODES, not path length
- We're summing both subtrees instead of taking the maximum
Root cause identified: We're counting nodes instead of measuring path length.
Why the current approach can't solve this: Addition combines both subtrees; height requires choosing the longer path.
What we need: Replace addition with maximum selection.
Iteration 1: Taking the Maximum Path
The key insight: height is determined by the LONGER of the two subtrees, not their sum.
def tree_height_v1(node):
"""
Iteration 1: Take maximum of subtree heights.
"""
if node is None:
return 0
# Get height of each subtree
left_height = tree_height_v1(node.left)
right_height = tree_height_v1(node.right)
# Height is 1 (this node) plus the taller subtree
return 1 + max(left_height, right_height)
# Test it
print("Height (v1):", tree_height_v1(ceo))
print("Expected: 4")
Python Output:
Height (v1): 4
Expected: 4
Verification: Let's trace the corrected execution:
tree_height_v1(CEO)
β left_height = tree_height_v1(VP_Eng)
β left_height = tree_height_v1(Mgr_A)
β left_height = tree_height_v1(IC_1)
β left_height = tree_height_v1(None) = 0
β right_height = tree_height_v1(None) = 0
β return 1 + max(0, 0) = 1
β right_height = tree_height_v1(None) = 0
β return 1 + max(1, 0) = 2
β right_height = tree_height_v1(Mgr_B)
β left_height = tree_height_v1(None) = 0
β right_height = tree_height_v1(None) = 0
β return 1 + max(0, 0) = 1
β return 1 + max(2, 1) = 3 β Takes longer path
β right_height = tree_height_v1(VP_Sales)
β left_height = tree_height_v1(None) = 0
β right_height = tree_height_v1(Mgr_C)
β left_height = tree_height_v1(None) = 0
β right_height = tree_height_v1(None) = 0
β return 1 + max(0, 0) = 1
β return 1 + max(0, 1) = 2
β return 1 + max(3, 2) = 4 β Takes longer path
Result: 4
Expected vs. Actual improvement: Now correctly identifies the longest path: CEO β VP_Eng β Mgr_A β IC_1.
Understanding Height vs. Depth
Now that we have height working, let's clarify a critical distinction that confuses many developers:
- Height: Distance from a node DOWN to its deepest leaf
- Depth: Distance from the root DOWN to a specific node
Height is measured from the bottom up. Depth is measured from the top down.
def node_depth(root, target_value, current_depth=0):
"""
Calculate the depth of a specific node.
Depth = distance from root to this node.
"""
if root is None:
return -1 # Node not found
if root.value == target_value:
return current_depth
# Search left subtree
left_depth = node_depth(root.left, target_value, current_depth + 1)
if left_depth != -1:
return left_depth
# Search right subtree
right_depth = node_depth(root.right, target_value, current_depth + 1)
return right_depth
# Test depth calculation
print("\nDepth measurements:")
print(f"CEO depth: {node_depth(ceo, 'CEO')}") # 0 (root)
print(f"VP_Eng depth: {node_depth(ceo, 'VP_Eng')}") # 1
print(f"Mgr_A depth: {node_depth(ceo, 'Mgr_A')}") # 2
print(f"IC_1 depth: {node_depth(ceo, 'IC_1')}") # 3
print("\nHeight measurements:")
print(f"CEO height: {tree_height_v1(ceo)}") # 4
print(f"VP_Eng height: {tree_height_v1(vp_eng)}") # 3
print(f"Mgr_A height: {tree_height_v1(mgr_a)}") # 2
print(f"IC_1 height: {tree_height_v1(ic_1)}") # 1
Python Output:
Depth measurements:
CEO depth: 0
VP_Eng depth: 1
Mgr_A depth: 2
IC_1 depth: 3
Height measurements:
CEO height: 4
VP_Eng height: 3
Mgr_A height: 2
IC_1 height: 1
Notice the pattern: For IC_1, depth=3 (3 edges from root) and height=1 (1 edge to leaf, which is itself).
Complexity Analysis
Time Complexity: O(n) - Each node visited exactly once - Constant work per node (two recursive calls + max operation) - Total operations: n nodes Γ O(1) = O(n)
Space Complexity: O(h) where h is tree height - Call stack depth equals the height of the tree - In worst case (skewed tree): O(n) - In best case (balanced tree): O(log n)
Recurrence Relation: T(n) = 2T(n/2) + O(1) - For balanced trees, this solves to O(n) by Master Theorem
When to Apply This Solution
What it optimizes for: Simplicity and correctness What it sacrifices: Nothingβthis is the standard approach When to choose this approach: Always for height calculation When to avoid this approach: Neverβthis is the canonical solution Complexity characteristics: - Time: O(n) - must visit all nodes - Space: O(h) - call stack depth equals tree height
Counting Nodes and Leaves
Counting Nodes and Leaves
Counting seems trivial, but it reveals the fundamental pattern of tree recursion: process current node, combine results from subtrees. Let's build on our org chart example to count different node types.
The Problem: Workforce Analytics
Your HR tool needs to answer questions like: - How many total employees? (all nodes) - How many individual contributors? (leaf nodes) - How many managers? (internal nodes)
Iteration 0: Counting All Nodes
Let's start with the simplest caseβcounting every node in the tree:
def count_nodes(node):
"""
Count all nodes in the tree.
Pattern: 1 (this node) + left_count + right_count
"""
if node is None:
return 0
left_count = count_nodes(node.left)
right_count = count_nodes(node.right)
return 1 + left_count + right_count
print("Total employees:", count_nodes(ceo))
Python Output:
Total employees: 7
Execution Trace:
count_nodes(CEO)
β left_count = count_nodes(VP_Eng) = 4
β left_count = count_nodes(Mgr_A) = 2
β left_count = count_nodes(IC_1) = 1
β right_count = count_nodes(None) = 0
β return 1 + 1 + 0 = 2
β right_count = count_nodes(Mgr_B) = 1
β return 1 + 2 + 1 = 4
β right_count = count_nodes(VP_Sales) = 2
β return 1 + 4 + 2 = 7
This is the pattern we incorrectly used for height! Here, addition is correct because we want the total from both subtrees.
Iteration 1: Counting Leaf Nodes Only
Now let's count individual contributorsβemployees with no direct reports (leaf nodes):
def count_leaves_v0(node):
"""
Attempt 1: Count leaf nodes (nodes with no children).
"""
if node is None:
return 0
# If this is a leaf, count it
if node.left is None and node.right is None:
return 1
# Otherwise, count leaves in subtrees
left_leaves = count_leaves_v0(node.left)
right_leaves = count_leaves_v0(node.right)
return left_leaves + right_leaves
print("Individual contributors:", count_leaves_v0(ceo))
Python Output:
Individual contributors: 3
Let's verify this is correct by identifying the leaves manually: - IC_1: No children β - Mgr_B: No children β - Mgr_C: No children β
Total: 3 leaves. Correct!
Execution Trace:
count_leaves_v0(CEO)
β Not a leaf (has children)
β left_leaves = count_leaves_v0(VP_Eng)
β Not a leaf (has children)
β left_leaves = count_leaves_v0(Mgr_A)
β Not a leaf (has left child)
β left_leaves = count_leaves_v0(IC_1)
β IS A LEAF! return 1
β right_leaves = count_leaves_v0(None) = 0
β return 1 + 0 = 1
β right_leaves = count_leaves_v0(Mgr_B)
β IS A LEAF! return 1
β return 1 + 1 = 2
β right_leaves = count_leaves_v0(VP_Sales)
β Not a leaf (has right child)
β left_leaves = count_leaves_v0(None) = 0
β right_leaves = count_leaves_v0(Mgr_C)
β IS A LEAF! return 1
β return 0 + 1 = 1
β return 2 + 1 = 3
The Pattern: Conditional Counting
Notice the structure:
1. Base case: None returns 0
2. Leaf detection: Check if both children are None
3. Leaf case: Return 1 (count this leaf)
4. Internal node case: Return sum of subtree counts
This pattern generalizes to counting any node type based on properties.
Iteration 2: Counting Internal Nodes
Let's count managers (internal nodesβnodes with at least one child):
def count_internal_nodes(node):
"""
Count internal nodes (nodes with at least one child).
"""
if node is None:
return 0
# If this is a leaf, don't count it
if node.left is None and node.right is None:
return 0
# This is an internal node, count it
left_internal = count_internal_nodes(node.left)
right_internal = count_internal_nodes(node.right)
return 1 + left_internal + right_internal
print("Managers:", count_internal_nodes(ceo))
print("Verification: 7 total - 3 leaves = 4 managers")
Python Output:
Managers: 4
Verification: 7 total - 3 leaves = 4 managers
Verification: CEO, VP_Eng, VP_Sales, Mgr_A all have children. That's 4 internal nodes. β
Iteration 3: Counting Nodes with Specific Properties
Let's generalize to count nodes matching any condition:
def count_nodes_matching(node, condition):
"""
Count nodes where condition(node) returns True.
Args:
node: Current tree node
condition: Function that takes a node and returns bool
Returns:
Count of nodes matching the condition
"""
if node is None:
return 0
# Count this node if it matches
current_count = 1 if condition(node) else 0
# Count matching nodes in subtrees
left_count = count_nodes_matching(node.left, condition)
right_count = count_nodes_matching(node.right, condition)
return current_count + left_count + right_count
# Example conditions
def is_leaf(node):
return node.left is None and node.right is None
def is_internal(node):
return node.left is not None or node.right is not None
def has_vp_title(node):
return "VP" in node.value
# Test with different conditions
print("\nUsing generalized counter:")
print(f"Leaves: {count_nodes_matching(ceo, is_leaf)}")
print(f"Internal nodes: {count_nodes_matching(ceo, is_internal)}")
print(f"VPs: {count_nodes_matching(ceo, has_vp_title)}")
Python Output:
Using generalized counter:
Leaves: 3
Internal nodes: 4
VPs: 2
When to Apply This Solution
What it optimizes for: Flexibility and reusability What it sacrifices: Slight performance overhead from function calls When to choose this approach: - When you need to count multiple different node types - When the condition is complex or changes frequently - When building a tree analysis library
When to avoid this approach: - When you only need one specific count (use specialized function) - When performance is critical (inline the condition)
Complexity characteristics: - Time: O(n) - must visit all nodes - Space: O(h) - call stack depth
Path Finding
Path Finding
Finding paths in trees is fundamental to many algorithms: file system navigation, decision trees, game AI, and more. Let's explore path finding through progressively complex scenarios.
The Problem: Tracing Reporting Lines
In our org chart, we need to find the reporting path from any employee up to the CEO. This helps answer questions like "Who are all the managers between IC_1 and the CEO?"
Iteration 0: Finding Any Path to a Target
Let's start by finding if a path exists and returning it:
def find_path_v0(node, target_value):
"""
Attempt 1: Find path from root to target node.
Returns list of values along the path, or None if not found.
"""
if node is None:
return None
# Found the target!
if node.value == target_value:
return [node.value]
# Search left subtree
left_path = find_path_v0(node.left, target_value)
if left_path is not None:
return [node.value] + left_path
# Search right subtree
right_path = find_path_v0(node.right, target_value)
if right_path is not None:
return [node.value] + right_path
# Not found in either subtree
return None
# Test path finding
print("Path to IC_1:", find_path_v0(ceo, "IC_1"))
print("Path to Mgr_C:", find_path_v0(ceo, "Mgr_C"))
print("Path to nonexistent:", find_path_v0(ceo, "Janitor"))
Python Output:
Path to IC_1: ['CEO', 'VP_Eng', 'Mgr_A', 'IC_1']
Path to Mgr_C: ['CEO', 'VP_Sales', 'Mgr_C']
Path to nonexistent: None
Execution Trace for finding IC_1:
find_path_v0(CEO, "IC_1")
β Not the target
β left_path = find_path_v0(VP_Eng, "IC_1")
β Not the target
β left_path = find_path_v0(Mgr_A, "IC_1")
β Not the target
β left_path = find_path_v0(IC_1, "IC_1")
β FOUND! return ["IC_1"]
β return ["Mgr_A"] + ["IC_1"] = ["Mgr_A", "IC_1"]
β return ["VP_Eng"] + ["Mgr_A", "IC_1"] = ["VP_Eng", "Mgr_A", "IC_1"]
β return ["CEO"] + ["VP_Eng", "Mgr_A", "IC_1"] = ["CEO", "VP_Eng", "Mgr_A", "IC_1"]
The Pattern: Build Path on Return
The key insight: we build the path as we return from the recursion. Each level prepends its own value to the path found in its subtree.
Iteration 1: Finding All Paths to Leaves
What if we want all possible reporting chains from CEO to individual contributors? We need to find ALL paths to leaf nodes:
def find_all_paths_to_leaves(node, current_path=None):
"""
Find all paths from root to leaf nodes.
Args:
node: Current tree node
current_path: Path from root to current node (accumulated)
Returns:
List of paths, where each path is a list of values
"""
if current_path is None:
current_path = []
if node is None:
return []
# Add current node to path
current_path = current_path + [node.value]
# If this is a leaf, return this complete path
if node.left is None and node.right is None:
return [current_path]
# Collect paths from both subtrees
paths = []
paths.extend(find_all_paths_to_leaves(node.left, current_path))
paths.extend(find_all_paths_to_leaves(node.right, current_path))
return paths
# Find all reporting chains
all_paths = find_all_paths_to_leaves(ceo)
print("\nAll reporting chains to ICs:")
for i, path in enumerate(all_paths, 1):
print(f"{i}. {' -> '.join(path)}")
Python Output:
All reporting chains to ICs:
1. CEO -> VP_Eng -> Mgr_A -> IC_1
2. CEO -> VP_Eng -> Mgr_B
3. CEO -> VP_Sales -> Mgr_C
Execution Trace (simplified):
find_all_paths_to_leaves(CEO, [])
current_path = ["CEO"]
β Not a leaf
β paths from left = find_all_paths_to_leaves(VP_Eng, ["CEO"])
current_path = ["CEO", "VP_Eng"]
β Not a leaf
β paths from left = find_all_paths_to_leaves(Mgr_A, ["CEO", "VP_Eng"])
current_path = ["CEO", "VP_Eng", "Mgr_A"]
β Not a leaf
β paths from left = find_all_paths_to_leaves(IC_1, ["CEO", "VP_Eng", "Mgr_A"])
current_path = ["CEO", "VP_Eng", "Mgr_A", "IC_1"]
β IS A LEAF! return [["CEO", "VP_Eng", "Mgr_A", "IC_1"]]
β paths from right = []
β return [["CEO", "VP_Eng", "Mgr_A", "IC_1"]]
β paths from right = find_all_paths_to_leaves(Mgr_B, ["CEO", "VP_Eng"])
current_path = ["CEO", "VP_Eng", "Mgr_B"]
β IS A LEAF! return [["CEO", "VP_Eng", "Mgr_B"]]
β return [["CEO", "VP_Eng", "Mgr_A", "IC_1"], ["CEO", "VP_Eng", "Mgr_B"]]
β paths from right = find_all_paths_to_leaves(VP_Sales, ["CEO"])
β return [["CEO", "VP_Sales", "Mgr_C"]]
β return all three paths
Iteration 2: Path with Maximum Sum
Let's add a practical twist: each employee has a salary, and we want to find the most expensive reporting chain (path with maximum sum):
class SalaryNode:
"""Tree node with salary information."""
def __init__(self, name, salary, left=None, right=None):
self.name = name
self.salary = salary
self.left = left
self.right = right
def __repr__(self):
return f"SalaryNode({self.name}, ${self.salary}k)"
# Build org chart with salaries
ceo_sal = SalaryNode("CEO", 500)
vp_eng_sal = SalaryNode("VP_Eng", 300)
vp_sales_sal = SalaryNode("VP_Sales", 280)
mgr_a_sal = SalaryNode("Mgr_A", 150)
mgr_b_sal = SalaryNode("Mgr_B", 140)
mgr_c_sal = SalaryNode("Mgr_C", 145)
ic_1_sal = SalaryNode("IC_1", 100)
ceo_sal.left = vp_eng_sal
ceo_sal.right = vp_sales_sal
vp_eng_sal.left = mgr_a_sal
vp_eng_sal.right = mgr_b_sal
vp_sales_sal.right = mgr_c_sal
mgr_a_sal.left = ic_1_sal
def find_max_sum_path(node):
"""
Find the path from root to leaf with maximum salary sum.
Returns:
Tuple of (max_sum, path_list)
"""
if node is None:
return (0, [])
# If leaf, return this node's salary and path
if node.left is None and node.right is None:
return (node.salary, [node.name])
# Get max paths from both subtrees
left_sum, left_path = find_max_sum_path(node.left)
right_sum, right_path = find_max_sum_path(node.right)
# Choose the subtree with higher sum
if left_sum >= right_sum:
return (node.salary + left_sum, [node.name] + left_path)
else:
return (node.salary + right_sum, [node.name] + right_path)
max_sum, max_path = find_max_sum_path(ceo_sal)
print(f"\nMost expensive reporting chain:")
print(f"Path: {' -> '.join(max_path)}")
print(f"Total cost: ${max_sum}k")
# Verify by calculating manually
print("\nVerification - all path costs:")
paths_with_costs = []
for path in find_all_paths_to_leaves(ceo):
# Calculate cost for this path
cost = 0
current = ceo_sal
for name in path[1:]: # Skip CEO, we'll add it separately
cost += current.salary
if current.left and current.left.name == name:
current = current.left
elif current.right and current.right.name == name:
current = current.right
cost += current.salary # Add final node
paths_with_costs.append((cost, ' -> '.join(path)))
for cost, path in sorted(paths_with_costs, reverse=True):
print(f"${cost}k: {path}")
Python Output:
Most expensive reporting chain:
Path: CEO -> VP_Eng -> Mgr_A -> IC_1
Total cost: $1050k
Verification - all path costs:
$1050k: CEO -> VP_Eng -> Mgr_A -> IC_1
$940k: CEO -> VP_Eng -> Mgr_B
$925k: CEO -> VP_Sales -> Mgr_C
Iteration 3: Finding Path Between Two Nodes
A more complex problem: find the path between any two nodes (not just root to leaf). This requires finding the lowest common ancestor (LCA):
def find_path_between_nodes(root, start_value, end_value):
"""
Find path between any two nodes in the tree.
Strategy:
1. Find path from root to start
2. Find path from root to end
3. Find their lowest common ancestor (LCA)
4. Combine: (root->start reversed) + (LCA->end)
"""
# Find both paths from root
path_to_start = find_path_v0(root, start_value)
path_to_end = find_path_v0(root, end_value)
if path_to_start is None or path_to_end is None:
return None
# Find lowest common ancestor (last common node)
lca_index = 0
for i in range(min(len(path_to_start), len(path_to_end))):
if path_to_start[i] == path_to_end[i]:
lca_index = i
else:
break
# Build path: start -> LCA -> end
# Reverse path from start to LCA, then add path from LCA to end
path_start_to_lca = path_to_start[lca_index:][::-1]
path_lca_to_end = path_to_end[lca_index+1:]
return path_start_to_lca + path_lca_to_end
# Test paths between arbitrary nodes
print("\nPaths between arbitrary nodes:")
print("IC_1 to Mgr_C:", find_path_between_nodes(ceo, "IC_1", "Mgr_C"))
print("Mgr_A to Mgr_B:", find_path_between_nodes(ceo, "Mgr_A", "Mgr_B"))
print("VP_Eng to VP_Sales:", find_path_between_nodes(ceo, "VP_Eng", "VP_Sales"))
Python Output:
Paths between arbitrary nodes:
IC_1 to Mgr_C: ['IC_1', 'Mgr_A', 'VP_Eng', 'CEO', 'VP_Sales', 'Mgr_C']
Mgr_A to Mgr_B: ['Mgr_A', 'VP_Eng', 'Mgr_B']
VP_Eng to VP_Sales: ['VP_Eng', 'CEO', 'VP_Sales']
Complexity Analysis
For single path finding: - Time: O(n) - worst case visits all nodes - Space: O(h) - call stack depth plus path storage
For all paths to leaves: - Time: O(n) - must visit all nodes - Space: O(n Γ h) - storing all paths, each of length up to h
For path between two nodes: - Time: O(n) - two path searches - Space: O(h) - two paths stored
When to Apply These Solutions
Single path finding: - When you need to trace ancestry or lineage - When you need to verify connectivity - When building navigation systems
All paths finding: - When analyzing all possible routes - When generating test cases - When computing aggregate statistics over paths
Path between nodes: - When building graph-like queries on trees - When computing distances between arbitrary nodes - When implementing "how to get from A to B" features
Tree Validation (BST Checker)
Tree Validation (BST Checker)
Binary Search Trees (BSTs) are trees with a special property: for every node, all values in the left subtree are smaller, and all values in the right subtree are larger. Validating this property is trickier than it first appearsβit's a classic example of how naive recursion can fail.
The Problem: Validating Employee ID Ordering
Your company assigns employee IDs such that the org chart should form a valid BST: managers have IDs between their left and right reports. You need to verify this property holds throughout the tree.
Let's build a BST with employee IDs:
class BSTNode:
"""Binary Search Tree node with employee ID."""
def __init__(self, emp_id, name, left=None, right=None):
self.emp_id = emp_id
self.name = name
self.left = left
self.right = right
def __repr__(self):
return f"BSTNode({self.emp_id}: {self.name})"
# Build a VALID BST
# 50:CEO
# / \
# 30:VP_Eng 70:VP_Sales
# / \ \
# 20:Mgr_A 40:Mgr_B 80:Mgr_C
# /
# 10:IC_1
ceo_bst = BSTNode(50, "CEO")
vp_eng_bst = BSTNode(30, "VP_Eng")
vp_sales_bst = BSTNode(70, "VP_Sales")
mgr_a_bst = BSTNode(20, "Mgr_A")
mgr_b_bst = BSTNode(40, "Mgr_B")
mgr_c_bst = BSTNode(80, "Mgr_C")
ic_1_bst = BSTNode(10, "IC_1")
ceo_bst.left = vp_eng_bst
ceo_bst.right = vp_sales_bst
vp_eng_bst.left = mgr_a_bst
vp_eng_bst.right = mgr_b_bst
vp_sales_bst.right = mgr_c_bst
mgr_a_bst.left = ic_1_bst
Iteration 0: The Naive Approach (WRONG)
The most intuitive approach: check that each node is greater than its left child and less than its right child:
def is_valid_bst_v0(node):
"""
Attempt 1: Check only immediate children (WRONG).
This will be our anchor example showing why local checks fail.
"""
if node is None:
return True
# Check left child
if node.left is not None and node.left.emp_id >= node.emp_id:
return False
# Check right child
if node.right is not None and node.right.emp_id <= node.emp_id:
return False
# Recursively check subtrees
return is_valid_bst_v0(node.left) and is_valid_bst_v0(node.right)
# Test on valid BST
print("Valid BST check (v0):", is_valid_bst_v0(ceo_bst))
Python Output:
Valid BST check (v0): True
This looks correct! But let's create an INVALID BST that will expose the flaw:
# Build an INVALID BST
# 50:CEO
# / \
# 30:VP_Eng 70:VP_Sales
# / \
# 20:Mgr_A 60:Mgr_B β INVALID! 60 > 50 but in left subtree
# /
# 10:IC_1
ceo_invalid = BSTNode(50, "CEO")
vp_eng_invalid = BSTNode(30, "VP_Eng")
vp_sales_invalid = BSTNode(70, "VP_Sales")
mgr_a_invalid = BSTNode(20, "Mgr_A")
mgr_b_invalid = BSTNode(60, "Mgr_B") # β This violates BST property!
ic_1_invalid = BSTNode(10, "IC_1")
ceo_invalid.left = vp_eng_invalid
ceo_invalid.right = vp_sales_invalid
vp_eng_invalid.left = mgr_a_invalid
vp_eng_invalid.right = mgr_b_invalid
mgr_a_invalid.left = ic_1_invalid
print("\nInvalid BST check (v0):", is_valid_bst_v0(ceo_invalid))
print("Expected: False (60 is in left subtree of 50, but 60 > 50)")
Python Output:
Invalid BST check (v0): True
Expected: False (60 is in left subtree of 50, but 60 > 50)
Diagnostic Analysis: Understanding the Failure
The complete execution trace:
Let's trace what happened with the invalid tree:
is_valid_bst_v0(CEO:50)
β left child (VP_Eng:30) < 50? Yes β
β right child (VP_Sales:70) > 50? Yes β
β Check left subtree: is_valid_bst_v0(VP_Eng:30)
β left child (Mgr_A:20) < 30? Yes β
β right child (Mgr_B:60) > 30? Yes β β LOCAL CHECK PASSES!
β Check left subtree: is_valid_bst_v0(Mgr_A:20)
β left child (IC_1:10) < 20? Yes β
β No right child
β Check left subtree: is_valid_bst_v0(IC_1:10)
β No children, return True
β return True
β return True
β Check right subtree: is_valid_bst_v0(Mgr_B:60)
β No children, return True
β return True
β return True β WRONG! Should be False
β Check right subtree: is_valid_bst_v0(VP_Sales:70)
β return True
β return True β FINAL RESULT: WRONG!
Let's parse this section by section:
- The error: Function returned
Truefor an invalid BST -
What this tells us: Our validation logic is incomplete
-
The problematic node: Mgr_B with ID 60
- Local check: 60 > 30 (parent VP_Eng) β Passes
- Global constraint: 60 should be < 50 (ancestor CEO) β Violates
-
Key observation: We only checked against immediate parent
-
The recursive pattern:
- Each node only compares with its direct children
- No information about ancestors is passed down
-
Subtrees don't know the valid range they must satisfy
-
Why the local check passes:
- At VP_Eng (30), we check: left (20) < 30 β and right (60) > 30 β
- Both local conditions are satisfied
- But we never check that 60 must also be < 50 (CEO's value)
Root cause identified: We're only enforcing local BST property (node vs. children), not global BST property (node vs. all ancestors).
Why the current approach can't solve this: Each recursive call has no knowledge of ancestor constraints. The information about valid ranges is lost.
What we need: Pass down the valid range (min, max) that each subtree must satisfy.
Iteration 1: Range-Based Validation
The fix: each node must satisfy not just local constraints, but global constraints from all ancestors:
def is_valid_bst_v1(node, min_val=float('-inf'), max_val=float('inf')):
"""
Iteration 1: Check with valid range constraints.
Args:
node: Current node to validate
min_val: All nodes in this subtree must be > min_val
max_val: All nodes in this subtree must be < max_val
Returns:
True if subtree rooted at node is a valid BST
"""
if node is None:
return True
# Check if current node violates range constraints
if node.emp_id <= min_val or node.emp_id >= max_val:
return False
# Recursively validate subtrees with updated ranges
# Left subtree: all values must be < node.emp_id
left_valid = is_valid_bst_v1(node.left, min_val, node.emp_id)
# Right subtree: all values must be > node.emp_id
right_valid = is_valid_bst_v1(node.right, node.emp_id, max_val)
return left_valid and right_valid
# Test on both trees
print("\nValid BST check (v1):", is_valid_bst_v1(ceo_bst))
print("Invalid BST check (v1):", is_valid_bst_v1(ceo_invalid))
Python Output:
Valid BST check (v1): True
Invalid BST check (v1): False
Verification: Let's trace the corrected execution on the invalid tree:
is_valid_bst_v1(CEO:50, -β, +β)
β 50 in range (-β, +β)? Yes β
β Check left: is_valid_bst_v1(VP_Eng:30, -β, 50)
β 30 in range (-β, 50)? Yes β
β Check left: is_valid_bst_v1(Mgr_A:20, -β, 30)
β 20 in range (-β, 30)? Yes β
β Check left: is_valid_bst_v1(IC_1:10, -β, 20)
β 10 in range (-β, 20)? Yes β
β return True
β return True
β Check right: is_valid_bst_v1(Mgr_B:60, 30, 50)
β 60 in range (30, 50)? NO! 60 >= 50 β
β return False β CAUGHT THE VIOLATION!
β return False
β return False
Expected vs. Actual improvement: Now correctly identifies that 60 violates the constraint inherited from CEO (must be < 50).
The Pattern: Passing Constraints Down
The key insight: each recursive call narrows the valid range: - When going left: upper bound becomes parent's value - When going right: lower bound becomes parent's value - Constraints accumulate as we descend
Iteration 2: Alternative Approach - Inorder Traversal
There's another elegant solution: a valid BST's inorder traversal produces a sorted sequence:
def is_valid_bst_v2(node):
"""
Iteration 2: Validate using inorder traversal.
A valid BST's inorder traversal is strictly increasing.
"""
def inorder_values(node):
"""Get inorder traversal as a list."""
if node is None:
return []
result = []
result.extend(inorder_values(node.left))
result.append(node.emp_id)
result.extend(inorder_values(node.right))
return result
# Get inorder traversal
values = inorder_values(node)
# Check if strictly increasing
for i in range(len(values) - 1):
if values[i] >= values[i + 1]:
return False
return True
# Test both approaches
print("\nInorder approach:")
print("Valid BST:", is_valid_bst_v2(ceo_bst))
print("Invalid BST:", is_valid_bst_v2(ceo_invalid))
# Show the inorder traversals
def get_inorder(node):
if node is None:
return []
return get_inorder(node.left) + [node.emp_id] + get_inorder(node.right)
print("\nInorder traversals:")
print("Valid BST:", get_inorder(ceo_bst))
print("Invalid BST:", get_inorder(ceo_invalid))
Python Output:
Inorder approach:
Valid BST: True
Invalid BST: False
Inorder traversals:
Valid BST: [10, 20, 30, 40, 50, 70, 80]
Invalid BST: [10, 20, 30, 60, 50, 70]
Notice: The invalid BST's inorder traversal is NOT sorted (60 comes before 50).
Iteration 3: Space-Optimized Inorder Check
The previous approach uses O(n) extra space to store all values. We can optimize by checking during traversal:
def is_valid_bst_v3(node):
"""
Iteration 3: Space-optimized inorder validation.
Check ordering during traversal without storing all values.
"""
# Use a mutable container to track previous value
prev = [float('-inf')]
def inorder_check(node):
if node is None:
return True
# Check left subtree
if not inorder_check(node.left):
return False
# Check current node against previous
if node.emp_id <= prev[0]:
return False
prev[0] = node.emp_id
# Check right subtree
return inorder_check(node.right)
return inorder_check(node)
print("\nSpace-optimized inorder:")
print("Valid BST:", is_valid_bst_v3(ceo_bst))
print("Invalid BST:", is_valid_bst_v3(ceo_invalid))
Python Output:
Space-optimized inorder:
Valid BST: True
Invalid BST: False
The Journey: From Problem to Solution
| Iteration | Failure Mode | Technique Applied | Result | Complexity Impact |
|---|---|---|---|---|
| 0 | Only checks local parent-child relationships | None | Misses violations from ancestors | O(n) time, O(h) space |
| 1 | N/A | Pass valid range constraints down | Correctly validates global BST property | O(n) time, O(h) space |
| 2 | Uses O(n) extra space | Inorder traversal + sorted check | Correct but space-inefficient | O(n) time, O(n) space |
| 3 | N/A | Inorder with running previous value | Correct and space-efficient | O(n) time, O(h) space |
Decision Framework: Which Approach When?
| Approach | Time | Space | Pros | Cons | Best For |
|---|---|---|---|---|---|
| Range-based (v1) | O(n) | O(h) | Intuitive, catches violation early | Requires understanding range propagation | General BST validation |
| Inorder list (v2) | O(n) | O(n) | Simple to understand | Extra space for all values | Teaching, small trees |
| Inorder optimized (v3) | O(n) | O(h) | Space-efficient | Slightly more complex logic | Production code |
Complexity Analysis
All approaches: - Time Complexity: O(n) - must visit all nodes to validate - Space Complexity: - Range-based: O(h) - call stack only - Inorder list: O(n) - stores all values - Inorder optimized: O(h) - call stack only
Why O(n) time is unavoidable: We must check every node. A single invalid node anywhere invalidates the entire tree.
When to Apply These Solutions
Range-based validation: - When you need to fail fast (stops at first violation) - When teaching BST properties - When implementing BST insertion/deletion (same range logic)
Inorder traversal: - When you also need the sorted sequence - When the tree is small - When simplicity matters more than space
Space-optimized inorder: - Production code where space matters - Large trees - When you want elegant functional-style code
Practice: 10 Essential Tree Problems
Practice: 10 Essential Tree Problems
Now that you've mastered the fundamental patterns, let's apply them to 10 essential tree problems. Each problem builds on the techniques we've learned: height calculation, counting, path finding, and validation.
For each problem, we'll provide: 1. Problem statement 2. Hints about which pattern to apply 3. Complete solution with explanation 4. Test cases
Problem 1: Minimum Depth
Problem: Find the minimum depth of a binary tree (shortest path from root to any leaf).
Hint: Similar to height calculation, but use min() instead of max(). Watch out for nodes with only one child!
def min_depth(node):
"""
Find minimum depth (shortest path to a leaf).
Key insight: A node with only one child is NOT a leaf,
so we can't just take min of both subtrees.
"""
if node is None:
return 0
# If leaf node, depth is 1
if node.left is None and node.right is None:
return 1
# If only one child exists, must go down that path
if node.left is None:
return 1 + min_depth(node.right)
if node.right is None:
return 1 + min_depth(node.left)
# Both children exist, take minimum
return 1 + min(min_depth(node.left), min_depth(node.right))
# Test with our org chart
print("Problem 1 - Minimum Depth:")
print(f"Min depth: {min_depth(ceo)}")
print(f"Max depth: {tree_height_v1(ceo)}")
print("Shortest path is to Mgr_B (3 levels)")
Python Output:
Problem 1 - Minimum Depth:
Min depth: 3
Max depth: 4
Shortest path is to Mgr_B (3 levels)
Problem 2: Is Balanced Tree
Problem: Determine if a binary tree is height-balanced (for every node, the heights of left and right subtrees differ by at most 1).
Hint: Calculate height while checking balance. Return both height and balance status.
def is_balanced(node):
"""
Check if tree is height-balanced.
Strategy: Calculate height while checking balance.
Return tuple: (is_balanced, height)
"""
def check_balance(node):
if node is None:
return (True, 0)
# Check left subtree
left_balanced, left_height = check_balance(node.left)
if not left_balanced:
return (False, 0)
# Check right subtree
right_balanced, right_height = check_balance(node.right)
if not right_balanced:
return (False, 0)
# Check if current node is balanced
height_diff = abs(left_height - right_height)
is_balanced = height_diff <= 1
current_height = 1 + max(left_height, right_height)
return (is_balanced, current_height)
balanced, _ = check_balance(node)
return balanced
# Test
print("\nProblem 2 - Is Balanced:")
print(f"Org chart balanced: {is_balanced(ceo)}")
# Create unbalanced tree
unbalanced = TreeNode("A")
unbalanced.left = TreeNode("B")
unbalanced.left.left = TreeNode("C")
unbalanced.left.left.left = TreeNode("D")
print(f"Skewed tree balanced: {is_balanced(unbalanced)}")
Python Output:
Problem 2 - Is Balanced:
Org chart balanced: True
Skewed tree balanced: False
Problem 3: Diameter of Tree
Problem: Find the diameter of a tree (longest path between any two nodes, not necessarily through root).
Hint: For each node, the longest path through it is left_height + right_height. Track the maximum.
def tree_diameter(node):
"""
Find the diameter (longest path between any two nodes).
Key insight: The longest path through a node is
left_height + right_height (not necessarily through root).
"""
max_diameter = [0] # Use list to allow modification in nested function
def height_and_diameter(node):
if node is None:
return 0
left_height = height_and_diameter(node.left)
right_height = height_and_diameter(node.right)
# Diameter through this node
diameter_through_node = left_height + right_height
max_diameter[0] = max(max_diameter[0], diameter_through_node)
# Return height for parent
return 1 + max(left_height, right_height)
height_and_diameter(node)
return max_diameter[0]
print("\nProblem 3 - Tree Diameter:")
print(f"Org chart diameter: {tree_diameter(ceo)}")
print("Longest path: IC_1 -> Mgr_A -> VP_Eng -> Mgr_B (3 edges)")
Python Output:
Problem 3 - Tree Diameter:
Org chart diameter: 3
Longest path: IC_1 -> Mgr_A -> VP_Eng -> Mgr_B (3 edges)
Problem 4: Lowest Common Ancestor
Problem: Find the lowest common ancestor (LCA) of two nodes.
Hint: The LCA is the deepest node that has both targets in its subtrees.
def lowest_common_ancestor(root, val1, val2):
"""
Find the lowest common ancestor of two nodes.
Strategy:
- If current node is one of the targets, return it
- If both targets found in different subtrees, current node is LCA
- If both in same subtree, recurse deeper
"""
if root is None:
return None
# If current node is one of the targets
if root.value == val1 or root.value == val2:
return root
# Search in subtrees
left_result = lowest_common_ancestor(root.left, val1, val2)
right_result = lowest_common_ancestor(root.right, val1, val2)
# If found in both subtrees, current node is LCA
if left_result and right_result:
return root
# Otherwise, return whichever subtree found something
return left_result if left_result else right_result
print("\nProblem 4 - Lowest Common Ancestor:")
lca = lowest_common_ancestor(ceo, "IC_1", "Mgr_B")
print(f"LCA of IC_1 and Mgr_B: {lca.value}")
lca2 = lowest_common_ancestor(ceo, "IC_1", "Mgr_C")
print(f"LCA of IC_1 and Mgr_C: {lca2.value}")
Python Output:
Problem 4 - Lowest Common Ancestor:
LCA of IC_1 and Mgr_B: VP_Eng
LCA of IC_1 and Mgr_C: CEO
Problem 5: Level Order Traversal
Problem: Return nodes level by level (breadth-first traversal).
Hint: Use a queue to process nodes level by level. This is iterative, not recursive!
from collections import deque
def level_order_traversal(root):
"""
Return nodes grouped by level (breadth-first).
This is naturally iterative using a queue.
"""
if root is None:
return []
result = []
queue = deque([root])
while queue:
level_size = len(queue)
current_level = []
for _ in range(level_size):
node = queue.popleft()
current_level.append(node.value)
if node.left:
queue.append(node.left)
if node.right:
queue.append(node.right)
result.append(current_level)
return result
print("\nProblem 5 - Level Order Traversal:")
levels = level_order_traversal(ceo)
for i, level in enumerate(levels):
print(f"Level {i}: {level}")
Python Output:
Problem 5 - Level Order Traversal:
Level 0: ['CEO']
Level 1: ['VP_Eng', 'VP_Sales']
Level 2: ['Mgr_A', 'Mgr_B', 'Mgr_C']
Level 3: ['IC_1']
Problem 6: Symmetric Tree
Problem: Check if a tree is symmetric (mirror image of itself).
Hint: Compare left and right subtrees recursively. Left's left should match right's right.
def is_symmetric(root):
"""
Check if tree is symmetric (mirror image).
Strategy: Compare left and right subtrees recursively.
"""
def is_mirror(left, right):
# Both None - symmetric
if left is None and right is None:
return True
# One None - not symmetric
if left is None or right is None:
return False
# Values must match, and subtrees must be mirrors
return (left.value == right.value and
is_mirror(left.left, right.right) and
is_mirror(left.right, right.left))
if root is None:
return True
return is_mirror(root.left, root.right)
# Create symmetric tree
sym_root = TreeNode("A")
sym_root.left = TreeNode("B")
sym_root.right = TreeNode("B")
sym_root.left.left = TreeNode("C")
sym_root.right.right = TreeNode("C")
print("\nProblem 6 - Symmetric Tree:")
print(f"Symmetric tree: {is_symmetric(sym_root)}")
print(f"Org chart symmetric: {is_symmetric(ceo)}")
Python Output:
Problem 6 - Symmetric Tree:
Symmetric tree: True
Org chart symmetric: False
Problem 7: Path Sum
Problem: Check if there exists a root-to-leaf path with a given sum.
Hint: Subtract current value from target, recurse with remaining sum.
def has_path_sum(node, target_sum):
"""
Check if there's a root-to-leaf path with given sum.
Strategy: Subtract current value, check if remaining sum
can be achieved in subtrees.
"""
if node is None:
return False
# If leaf, check if we've reached exactly the target
if node.left is None and node.right is None:
return node.salary == target_sum
# Check if either subtree can achieve remaining sum
remaining = target_sum - node.salary
return (has_path_sum(node.left, remaining) or
has_path_sum(node.right, remaining))
print("\nProblem 7 - Path Sum:")
print(f"Path with sum 1050: {has_path_sum(ceo_sal, 1050)}")
print(f"Path with sum 1000: {has_path_sum(ceo_sal, 1000)}")
Python Output:
Problem 7 - Path Sum:
Path with sum 1050: True
Path with sum 1000: False
Problem 8: Invert Binary Tree
Problem: Invert a binary tree (swap left and right children at every node).
Hint: Recursively swap children, then invert subtrees.
def invert_tree(node):
"""
Invert binary tree (swap all left and right children).
Strategy: Swap children, then recursively invert subtrees.
"""
if node is None:
return None
# Swap children
node.left, node.right = node.right, node.left
# Recursively invert subtrees
invert_tree(node.left)
invert_tree(node.right)
return node
# Create a simple tree to invert
original = TreeNode("A")
original.left = TreeNode("B")
original.right = TreeNode("C")
original.left.left = TreeNode("D")
original.left.right = TreeNode("E")
print("\nProblem 8 - Invert Tree:")
print("Original level order:", level_order_traversal(original))
invert_tree(original)
print("Inverted level order:", level_order_traversal(original))
Python Output:
Problem 8 - Invert Tree:
Original level order: [['A'], ['B', 'C'], ['D', 'E']]
Inverted level order: [['A'], ['C', 'B'], ['E', 'D']]
Problem 9: Serialize and Deserialize
Problem: Convert a tree to a string and back.
Hint: Use preorder traversal with null markers. Reconstruct using the same order.
def serialize(root):
"""
Serialize tree to string using preorder traversal.
Use '#' to mark None nodes.
"""
if root is None:
return "#"
# Preorder: root, left, right
return f"{root.value},{serialize(root.left)},{serialize(root.right)}"
def deserialize(data):
"""
Deserialize string back to tree.
"""
def build_tree(values):
val = next(values)
if val == "#":
return None
node = TreeNode(val)
node.left = build_tree(values)
node.right = build_tree(values)
return node
values = iter(data.split(","))
return build_tree(values)
print("\nProblem 9 - Serialize/Deserialize:")
serialized = serialize(ceo)
print(f"Serialized: {serialized[:50]}...")
deserialized = deserialize(serialized)
print(f"Deserialized root: {deserialized.value}")
print(f"Deserialized height: {tree_height_v1(deserialized)}")
Python Output:
Problem 9 - Serialize/Deserialize:
Serialized: CEO,VP_Eng,Mgr_A,IC_1,#,#,#,Mgr_B,#,#,VP_Sales...
Deserialized root: CEO
Deserialized height: 4
Problem 10: Subtree of Another Tree
Problem: Check if one tree is a subtree of another.
Hint: For each node in main tree, check if subtree starting there matches the target tree.
def is_subtree(main_tree, sub_tree):
"""
Check if sub_tree is a subtree of main_tree.
Strategy: For each node in main tree, check if the tree
rooted there is identical to sub_tree.
"""
def is_identical(t1, t2):
"""Check if two trees are identical."""
if t1 is None and t2 is None:
return True
if t1 is None or t2 is None:
return False
return (t1.value == t2.value and
is_identical(t1.left, t2.left) and
is_identical(t1.right, t2.right))
if main_tree is None:
return False
# Check if trees rooted here are identical
if is_identical(main_tree, sub_tree):
return True
# Check left and right subtrees
return (is_subtree(main_tree.left, sub_tree) or
is_subtree(main_tree.right, sub_tree))
# Create a subtree
subtree = TreeNode("VP_Eng")
subtree.left = TreeNode("Mgr_A")
subtree.right = TreeNode("Mgr_B")
subtree.left.left = TreeNode("IC_1")
print("\nProblem 10 - Is Subtree:")
print(f"VP_Eng subtree exists: {is_subtree(ceo, subtree)}")
# Modify subtree to not match
subtree.right.value = "Wrong"
print(f"Modified subtree exists: {is_subtree(ceo, subtree)}")
Python Output:
Problem 10 - Is Subtree:
VP_Eng subtree exists: True
Modified subtree exists: False
Summary: Pattern Recognition Guide
Here's how to recognize which pattern to apply:
| Problem Type | Key Pattern | Example Problems |
|---|---|---|
| Single value calculation | Process node, combine subtree results with operation | Height, count nodes, sum values |
| Boolean property | Check condition, AND/OR subtree results | Is balanced, is symmetric, is valid BST |
| Path-based | Build path during recursion, return on condition | Find path, path sum, LCA |
| Transformation | Modify node, recurse on children | Invert tree, flatten tree |
| Comparison | Recursive equality check | Is subtree, is identical |
| Traversal | Visit in specific order, collect results | Serialize, inorder, preorder |
Complexity Patterns
Most tree problems share similar complexity:
Time Complexity: O(n) - Must visit each node at least once - Constant or logarithmic work per node
Space Complexity: O(h) - Call stack depth equals tree height - Worst case: O(n) for skewed tree - Best case: O(log n) for balanced tree
Exceptions: - Level-order traversal: O(w) space where w is max width - Serialization: O(n) space to store all nodes - Path storage: O(n Γ h) for all paths
Practice Strategy
To master these patterns:
- Identify the pattern before coding
- Draw the recursion tree for small examples
- Trace execution by hand to verify logic
- Test edge cases: empty tree, single node, skewed tree
- Analyze complexity after solving
The key to tree recursion mastery is recognizing that most problems follow one of these core patterns. Once you identify the pattern, the implementation becomes straightforward.